183 research outputs found

    Contour regression: A general approach to dimension reduction

    We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response. These directions span the orthogonal complement of the minimal space relevant for the regression and can be extracted according to two measures of variation in the response, leading to simple and general contour regression (SCR and GCR) methodology. In comparison with existing sufficient dimension reduction techniques, this contour-based methodology guarantees exhaustive estimation of the central subspace under ellipticity of the predictor distribution and mild additional assumptions, while maintaining √n-consistency and computational ease. Moreover, it proves robust to departures from ellipticity. We establish population properties for both SCR and GCR, and asymptotic properties for SCR. Simulations to compare performance with that of standard techniques such as ordinary least squares, sliced inverse regression, principal Hessian directions and sliced average variance estimation confirm the advantages anticipated by the theoretical analyses. We demonstrate the use of contour-based methods on a data set concerning soil evaporation. Comment: Published at http://dx.doi.org/10.1214/009053605000000192 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
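As a rough illustration of the contour idea described in this abstract, the sketch below estimates directions along which the response varies little and takes their orthogonal complement as the reduction direction. It is a minimal sketch and not the authors' implementation: the function name `simple_contour_regression`, the choice of threshold as a quantile of pairwise response differences (`pair_fraction`), and the simulated model are all assumptions made for the example.

```python
import numpy as np

def simple_contour_regression(X, y, d=1, pair_fraction=0.05):
    """Sketch of a contour-based estimate of d reduction directions.

    pair_fraction (an assumption of this sketch) sets the contour threshold
    as a quantile of the pairwise absolute differences in the response.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)

    # Standardize predictors with the inverse square root of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # Keep pairs of observations whose responses are close ("same contour").
    i, j = np.triu_indices(n, k=1)
    dy = np.abs(y[i] - y[j])
    keep = dy <= np.quantile(dy, pair_fraction)
    dZ = Z[i[keep]] - Z[j[keep]]

    # Average outer product of predictor differences along contours.
    H = dZ.T @ dZ / dZ.shape[0]

    # eigh returns eigenvalues in ascending order: the d smallest belong to
    # directions with little spread along contours, i.e. the reduction space.
    _, vecs = np.linalg.eigh(H)
    B = inv_sqrt @ vecs[:, :d]
    return B / np.linalg.norm(B, axis=0)

# Simulated example (hypothetical model): y depends on X only through X @ beta.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
u = X @ beta
y = u + u**3 + 0.1 * rng.normal(size=300)

B = simple_contour_regression(X, y, d=1)
alignment = abs(B[:, 0] @ beta)  # absolute cosine between estimate and beta
print("alignment with true direction:", alignment)
```

The link function in the example is monotone on purpose; the abstract notes that exhaustive estimation relies on ellipticity and additional assumptions, which this sketch does not check.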

    Modeling a Decentralized Asset Market: An Introduction to the Financial "Toy-Room"

    In this paper, we describe a micro-founded simulation environment for decentralized trade in financial assets. Within the philosophy of computer-simulated "artificial markets", this environment allows one to experiment in a modular fashion with (i) individual characterizations in terms of behaviors and learning, (ii) different architectural and institutional traits of the market, and (iii) time-embedding of events at the system and the individual level.

    Probabilistic K-mean with local alignment for clustering and motif discovery in functional data

    We develop a new method to locally cluster curves and discover functional motifs, i.e. typical "shapes" that may recur several times along and across the curves, capturing important local characteristics. In order to identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity seeds) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical "shape"). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study, and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to Berkeley growth data, Italian Covid-19 death curves and "Omics" data related to mutagenesis. Comment: 22 pages, 6 figures. This work has been presented at various conference

    Composite likelihood inference in a discrete latent variable model for two-way "clustering-by-segmentation" problems

    We consider a discrete latent variable model for two-way data arrays, which allows one to simultaneously produce clusters along one of the data dimensions (e.g. exchangeable observational units or features) and contiguous groups, or segments, along the other (e.g. consecutively ordered times or locations). The model relies on a hidden Markov structure but, given its complexity, cannot be estimated by full maximum likelihood. We therefore introduce composite likelihood methodology based on considering different subsets of the data. The proposed approach is illustrated by simulation, and with an application to genomic data.

    On the impact of serial dependence on penalized regression methods

    This paper characterizes the impact of covariates' serial dependence on the non-asymptotic estimation error bound of penalized regressions (PRs). Focusing on the direct relationship between the degree of cross-correlation of covariates and the estimation error bound of PRs, we show that orthogonal or weakly cross-correlated stationary AR processes can exhibit high spurious correlations caused by serial dependence. In this respect, we study analytically the density of sample cross-correlations in the case of two orthogonal Gaussian AR(1) processes. Our results are validated by an extensive simulation study. Furthermore, we introduce a new procedure to remedy spurious correlations in a time series regime, applying PRs to pre-whitened (ARMA filter) time series. We show that, under mild assumptions, our procedure both reduces the estimation error and supports an effective forecasting strategy. The estimation accuracy of our proposal is validated by means of simulations and an empirical application based on a large monthly macroeconomic data set for the Euro Area economy.
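The spurious-correlation effect described in this abstract can be seen in a small simulation. The sketch below generates pairs of independent Gaussian AR(1) series and compares the spread of their sample cross-correlations before and after pre-whitening. It is a minimal sketch under stated assumptions: it uses a simple least-squares AR(1) filter rather than the ARMA filter mentioned in the abstract, and the helper names and parameter values (`n=100`, `phi=0.9`, `reps=500`) are chosen only for illustration.

```python
import numpy as np

def simulate_ar1(n, phi, rng):
    """Simulate a zero-mean Gaussian AR(1) series of length n."""
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2))  # stationary start
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def prewhiten_ar1(x):
    """Fit an AR(1) coefficient by least squares and return the residuals.

    Assumption of this sketch: an AR(1) filter stands in for a general
    ARMA pre-whitening filter.
    """
    phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    return x[1:] - phi_hat * x[:-1]

def correlation_spread(n=100, phi=0.9, reps=500, seed=0):
    """Std. dev. of sample correlations between independent AR(1) pairs,
    computed on the raw series and on the pre-whitened residuals."""
    rng = np.random.default_rng(seed)
    raw, white = [], []
    for _ in range(reps):
        x = simulate_ar1(n, phi, rng)
        y = simulate_ar1(n, phi, rng)
        raw.append(np.corrcoef(x, y)[0, 1])
        white.append(np.corrcoef(prewhiten_ar1(x), prewhiten_ar1(y))[0, 1])
    return np.std(raw), np.std(white)

raw_sd, white_sd = correlation_spread()
print("spread of sample correlation, raw series:  ", raw_sd)
print("spread of sample correlation, pre-whitened:", white_sd)
```

Although the two series are independent by construction, the raw sample correlations are far more dispersed than the pre-whitened ones, which is the kind of spurious correlation a penalized regression would then have to contend with.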

    A general theory for nonlinear sufficient dimension reduction: Formulation and estimation

    In this paper we introduce a general theory for nonlinear sufficient dimension reduction, and explore its ramifications and scope. This theory subsumes recent work employing reproducing kernel Hilbert spaces, and reveals many parallels between linear and nonlinear sufficient dimension reduction. Using these parallels we analyze the properties of existing methods and develop new ones. We begin by characterizing dimension reduction at the general level of σ-fields and proceed to that of classes of functions, leading to the notions of sufficient, complete and central dimension reduction classes. We show that, when it exists, the complete and sufficient class coincides with the central class, and can be unbiasedly and exhaustively estimated by a generalized sliced inverse regression estimator (GSIR). When completeness does not hold, this estimator captures only part of the central class. However, in these cases we show that a generalized sliced average variance estimator (GSAVE) can capture a larger portion of the class. Both estimators require no numerical optimization because they can be computed by spectral decomposition of linear operators. Finally, we compare our estimators with existing methods by simulation and on actual data sets. Comment: Published at http://dx.doi.org/10.1214/12-AOS1071 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org